Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers
Authors
Abstract
We present a methodology for analyzing the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). Methodologically, we combine corpus-based feature extraction with data mining techniques.
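The abstract does not specify how diversification is quantified. As one possible illustration only, the following minimal Python sketch assumes that "shallow" features are relative frequencies of lowercased, whitespace-separated tokens, and approximates diversification within a time period as the mean pairwise Jensen-Shannon divergence between register profiles. The function names (`shallow_profile`, `jensen_shannon`, `mean_pairwise_divergence`) and the choice of Jensen-Shannon divergence are hypothetical and are not the method of the paper; linguistic features (e.g. part-of-speech or lexico-grammatical patterns) would require a tagger and are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def shallow_profile(text: str) -> dict[str, float]:
    """Hypothetical 'shallow' feature set: relative frequencies of lowercased whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def jensen_shannon(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2, range 0..1) between two frequency distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a: dict[str, float]) -> float:
        # KL(A || M); terms with a[w] == 0 contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def mean_pairwise_divergence(registers: dict[str, str]) -> float:
    """Average JSD over all pairs of registers (e.g. disciplines) within one time period."""
    if len(registers) < 2:
        raise ValueError("need at least two registers to compare")
    profiles = [shallow_profile(text) for text in registers.values()]
    pairs = list(combinations(profiles, 2))
    return sum(jensen_shannon(p, q) for p, q in pairs) / len(pairs)
```

Under these assumptions, diversification would appear as a higher value for the later period than for the earlier one, e.g. `mean_pairwise_divergence(texts_2000s) > mean_pairwise_divergence(texts_1970s)`, where both arguments are hypothetical dicts mapping a discipline name to its concatenated text.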
Similar resources
Scientific registers and disciplinary diversification: a comparable corpus approach
We present a study on linguistic contrast and commonality in English scientific discourse on the basis of a monolingually comparable corpus. The focus is on selected scientific disciplines at the boundaries to computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX) which covers a time ran...
Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach
In this paper, we present corpus-based procedures to semi-automatically discover features relevant for the study of recent language change in scientific registers. First, linguistic features potentially adherent to recent language change are extracted from the SciTex Corpus. Second, features are assessed for their relevance for the study of recent language change in scientific registers by mean...
Evaluation of the nutritional effects of fasting on cardiovascular diseases, using fuzzy data mining
Background: Advances in information technology and data collection methods have enabled high-speed collection and storage of huge amounts of data. Data mining can be used to derive laws from large data volumes and their characteristics. Similarly, fuzzy logic by facilitating the understanding of events is considered a suitable complement to scientific data mining. Materials and Methods: The pre...
Cross-Linguistic Transfer or Target Language Proficiency: Writing Performance of Trilinguals vs. Bilinguals in Relation to the Interdependence Hypothesis
This study explored the nature of transfer among bilinguals vs. trilinguals with varying levels of competence in English and their previous languages. The hypotheses were tested in writing tasks designed for 75 high (N=35) vs. intermediate (N=40) proficient EFL learners with Turkish, Persian, English and Persian, English linguistic backgrounds. Qualitative data were also collected through some ...
Text-Mining: Application Development Challenges
This paper reviews the best practices and challenges for project managers and developers involved in implementing text-mining applications. With focus on rule-based information extraction, and references to actual cases, the authors share their experiences from developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process ...